Recognizing Spam Domains by Extracting Features from Spam Emails using Data Mining

Authors

  • Kavita Patel
  • Soma Halder
  • Richa Tiwari
  • Alan Sprague
  • Marios Kokkodis
  • Ting-Kai Huang
  • Anthony Skjellum

Abstract

This paper attempts to develop an algorithm to recognize spam domains using data mining techniques, with a focus on law enforcement forensic analysis. Spam filtering has been the major weapon against spam, but it has failed to reduce the number of spam emails sent to indiscriminate sets of recipients. The proposed algorithm takes as input the spam emails of a personal account and extracts four kinds of features: stylistic features, semantic features, related email subjects, and the URLs present in the emails. The emails are clustered on each feature separately, and the resulting clusters are evaluated. The clusters are then mapped to their respective domains. A spam domain here is the URL of the web page that the spammer is trying to promote. The WHOIS information for a domain helps identify the source of that domain. Two parameters are used to measure the result for each feature: the overall purity, and the number of emails in the cluster with the highest purity. Experimental results show that clustering spam emails by stylistic and semantic features is 20% less pure than clustering by the other two features.
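
The abstract does not define how purity is computed, so the following is only a minimal sketch under the standard definition: a cluster's purity is the share of its emails that point to the cluster's most common domain, and overall purity is the sum of those majority counts divided by the total number of emails. The helper names `url_domain` and `cluster_purity`, the toy URLs, and the cluster assignments are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def url_domain(url):
    """Return the lowercased host name of a URL, without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def cluster_purity(cluster_ids, domains):
    """Return (overall purity, {cluster id: (purity, cluster size)}).

    Assumes the standard purity definition; the paper may use a variant.
    """
    members = defaultdict(list)
    for cid, dom in zip(cluster_ids, domains):
        members[cid].append(dom)

    per_cluster = {}
    majority_total = 0
    for cid, doms in members.items():
        # Count of emails that share the most common domain in this cluster.
        top_count = Counter(doms).most_common(1)[0][1]
        per_cluster[cid] = (top_count / len(doms), len(doms))
        majority_total += top_count

    overall = majority_total / len(domains) if domains else 0.0
    return overall, per_cluster


# Hypothetical toy data: one promoted URL per spam email, plus the cluster
# each email was assigned to by some feature-based clustering step.
urls = [
    "http://www.pills-example.com/buy",
    "http://pills-example.com/offer?id=3",
    "http://watches-example.net/",
    "http://watches-example.net/sale",
    "http://pills-example.com/x",
]
cluster_ids = [0, 0, 0, 1, 1]

domains = [url_domain(u) for u in urls]
overall, per_cluster = cluster_purity(cluster_ids, domains)

# Report the second parameter from the abstract: the size of the purest cluster.
best_id = max(per_cluster, key=lambda cid: per_cluster[cid][0])
print(overall, best_id, per_cluster[best_id])
```

Running the same function once per feature (stylistic, semantic, subject, URL) would give one purity figure per feature, which is the kind of comparison the abstract reports.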

Similar resources

Presenting a Suitable Method for Classifying Advertising Emails Based on User Profiles

In general, whether an email counts as spam depends on whether it satisfies the recipient, not on the content of the email. Under this definition, problems arise in marketing and advertising: for example, the same advertising email may be spam for some users and not spam for others. To deal with this problem, many researchers design an anti-s...

A Critical Analysis of Financial Fraud Spam in English in Terms of Persuasive Strategies: Personalization, Presupposition, and Lexical Choices

The term ‘spam’ addresses unsolicited emails sent in bulk; therefore, the term ‘financial fraud spam’ refers to unwanted bulk emails in which different tricks and techniques are employed to swindle money from the recipients. Estimates show that more than 80% of worldwide email traffic in 2011 was spam. It should be noted that while the number of daily spam emails in 2002 was 2.4 billion, this numbe...

A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors

Because cyberspace and the Internet predominate in users' lives, they bring, in addition to business opportunities and time savings, threats such as information theft and system penetration in both hardware and software. Security is the top priority in preventing a cyber-attack, and users should first detect the type of attack, because virtual environments are not moni...

Clustering Spam Domains and Destination Websites: Digital Forensics with Data Mining

Spam-related cybercrimes have become a serious threat to society. Current spam research mainly aims to detect spam more effectively. We believe that identifying and disrupting the supporting infrastructure used by spammers is a more effective way of stopping spam than filtering. The termination of spam hosts will greatly reduce the profit a spammer can generate and thwart his ability to s...

A New Model for Email Spam Detection using Hybrid of Magnetic Optimization Algorithm with Harmony Search Algorithm

Unfortunately, users of Internet services face many unwanted messages that are unrelated to their interests and scope and that contain advertising or even malicious content. Spam comprises a huge collection of infected and malicious advertising emails that cause harm by destroying data and stealing personal information for malicious purposes. In most cases, spam emails con...

Publication date: 2014